 local curvature


Diffusion Curvature for Estimating Local Curvature in High Dimensional Data

Neural Information Processing Systems

We introduce a new intrinsic measure of local curvature on point-cloud data called diffusion curvature. Our measure uses the framework of diffusion maps, including the data diffusion operator, to structure point cloud data and define local curvature based on the laziness of a random walk starting at a point or region of the data. We show that this laziness directly relates to volume comparison results from Riemannian geometry. We then extend this scalar curvature notion to an entire quadratic form using neural network estimations based on the diffusion map of point-cloud data. We show applications of both estimations on toy data, single-cell data, and on estimating local Hessian matrices of neural network loss landscapes.
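
The "laziness of a random walk" in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, the number of steps `t`, the neighborhood `radius`, and the function name `diffusion_laziness` are all assumptions chosen for illustration. The sketch builds a row-stochastic diffusion operator and measures how much probability mass of a `t`-step walk stays near its starting point.

```python
import numpy as np

def diffusion_laziness(X, sigma=0.5, t=4, radius=0.5):
    """Return, for each point, the mass of a t-step random walk that
    remains within `radius` of the start (hypothetical laziness proxy)."""
    # Pairwise squared Euclidean distances between all points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian affinity kernel (assumed; other kernels are possible).
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))
    # Row-normalize to obtain a Markov (diffusion) operator.
    P = K / K.sum(axis=1, keepdims=True)
    # Transition probabilities after t steps.
    Pt = np.linalg.matrix_power(P, t)
    # Keep only the mass that lands inside each point's neighborhood.
    near = sq_dists <= radius ** 2
    return (Pt * near).sum(axis=1)

# Usage on a random point cloud (synthetic data, for illustration only).
rng = np.random.default_rng(0)
points = rng.normal(size=(40, 3))
laziness = diffusion_laziness(points)
```

Under these assumptions, a higher value means the walk tends to stay close to where it started. Relating that quantity to curvature is the subject of the paper itself and is not reproduced here.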


Local Curvature Smoothing with Stein's Identity for Efficient Score Matching

Osada, Genki, Shing, Makoto, Nishide, Takashi

arXiv.org Artificial Intelligence

The training of score-based diffusion models (SDMs) is based on score matching. The challenge of score matching is that it includes a computationally expensive Jacobian trace. While several methods have been proposed to avoid this computation, each has drawbacks, such as unstable training or learning a denoising vector field rather than the true score. We propose a novel score matching variant, local curvature smoothing with Stein's identity (LCSS). LCSS bypasses the Jacobian trace by applying Stein's identity, which provides both effective regularization and efficient computation. We show that LCSS surpasses existing methods in sample generation performance and matches the performance of denoising score matching, widely adopted by most SDMs, in evaluations such as FID, Inception score, and bits per dimension. Furthermore, we show that LCSS enables realistic image generation even at a high resolution of $1024 \times 1024$.
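
As a rough illustration of how Stein's identity can replace a Jacobian trace, the sketch below uses the general Gaussian identity $\mathbb{E}_z[\operatorname{tr} J_s(x + \sigma z)] = \mathbb{E}_z[z^\top s(x + \sigma z)] / \sigma$ for $z \sim \mathcal{N}(0, I)$. It is not the LCSS objective from the paper. The linear score function, the matrix `A`, the point `x0`, and the helper name `stein_trace_estimate` are assumptions, chosen so that the exact trace is known.

```python
import numpy as np

def stein_trace_estimate(score_fn, x, sigma=1.0, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the Gaussian-smoothed Jacobian trace of
    score_fn at x, using only evaluations of score_fn (no derivatives)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, x.shape[0]))
    s = score_fn(x + sigma * z)  # shape: (n_samples, d)
    # Stein's identity: E[z . s(x + sigma z)] / sigma = E[tr J_s(x + sigma z)]
    return float(np.mean(np.sum(z * s, axis=1)) / sigma)

# Hypothetical linear "score" s(x) = A x, whose Jacobian trace is tr(A).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 1.0],
              [3.0, 0.0, 2.0]])

def linear_score(pts):
    # Apply A to each row of pts.
    return pts @ A.T

x0 = np.array([0.5, -0.2, 0.1])
estimate = stein_trace_estimate(linear_score, x0)
exact = float(np.trace(A))
```

For a linear score the smoothed trace equals the exact trace, so `estimate` should be close to `exact` up to Monte Carlo error. For a neural score network the same estimator would need only forward passes.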


Deferred Poisoning: Making the Model More Vulnerable via Hessian Singularization

He, Yuhao, Tian, Jinyu, Zheng, Xianwei, Dong, Li, Li, Yuanman, Zhou, Jiantao

arXiv.org Artificial Intelligence

Recent studies have shown that deep learning models are very vulnerable to poisoning attacks. Many defense methods have been proposed to address this issue. However, traditional poisoning attacks are not as threatening as commonly believed. This is because they often cause differences in how the model performs on the training set compared to the validation set. Such inconsistency can alert defenders that their data has been poisoned, allowing them to take the necessary defensive actions. In this paper, we introduce a more threatening type of poisoning attack called the Deferred Poisoning Attack. This new attack allows the model to function normally during the training and validation phases but makes it very sensitive to evasion attacks or even natural noise. We achieve this by ensuring that the poisoned model's loss function has a value similar to that of a normally trained model at each input sample but with a large local curvature. A similar model loss ensures that there is no obvious inconsistency between the training and validation accuracy, demonstrating high stealthiness. On the other hand, the large curvature implies that a small perturbation may cause a significant increase in model loss, leading to substantial performance degradation and reflecting worse robustness. We fulfill this purpose by making the model have singular Hessian information at the optimal point via our proposed Singularization Regularization term. We have conducted both theoretical and empirical analyses of the proposed method and validated its effectiveness through experiments on image classification tasks. Furthermore, we have confirmed the hazards of this form of poisoning attack under more general scenarios using natural noise, offering a new perspective for research in the field of security.
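
The claim that large curvature turns a small perturbation into a large loss increase follows from a second-order Taylor expansion. The sketch below illustrates only that intuition, assuming a point where the gradient vanishes so that the increase is approximately $\tfrac{1}{2}\delta^\top H \delta$. The two Hessians and the perturbation are made-up values, and the paper's Singularization Regularization term is not reproduced here.

```python
import numpy as np

def second_order_loss_increase(hessian, delta):
    """Approximate loss increase 0.5 * delta^T H delta at a stationary point."""
    return 0.5 * float(delta @ hessian @ delta)

# Two hypothetical Hessians with the same loss value at the optimum:
flat_hessian = np.diag([1.0, 1.0])     # low curvature in every direction
sharp_hessian = np.diag([100.0, 1.0])  # high curvature along the first axis

# The same small perturbation applied to both models.
delta = np.array([0.1, 0.0])

flat_increase = second_order_loss_increase(flat_hessian, delta)
sharp_increase = second_order_loss_increase(sharp_hessian, delta)
```

With these values the perturbation raises the loss a hundred times more under the sharp Hessian than under the flat one, even though both models have the same loss at the unperturbed point.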


Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator

Levada, Alexandre Luís Magalhães, Nielsen, Frank, Haddad, Michel Ferreira Cardia

arXiv.org Artificial Intelligence

The $k$-nearest neighbor ($k$-NN) algorithm is one of the most popular methods for nonparametric classification. However, a relevant limitation concerns the definition of the number of neighbors $k$. This parameter exerts a direct impact on several properties of the classifier, such as the bias-variance tradeoff, smoothness of decision boundaries, robustness to noise, and class imbalance handling. In the present paper, we introduce a new adaptive $k$-nearest neighbor ($kK$-NN) algorithm that exploits the local curvature at a sample to adaptively define the neighborhood size. The rationale is that points with low curvature could have larger neighborhoods (locally, the tangent space approximates the underlying data shape well), whereas points with high curvature could have smaller neighborhoods (locally, the tangent space is a loose approximation). We estimate the local Gaussian curvature by computing an approximation to the local shape operator in terms of the local covariance matrix as well as the local Hessian matrix. Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method and also to another adaptive $k$-NN algorithm. This is particularly evident when the number of samples in the training data is limited, suggesting that the $kK$-NN is capable of learning more discriminant functions with less data in many relevant cases.


From Zero to Hero: How local curvature at artless initial conditions leads away from bad minima

Bonnaire, Tony, Biroli, Giulio, Cammarota, Chiara

arXiv.org Artificial Intelligence

We investigate the optimization dynamics of gradient descent in a non-convex and high-dimensional setting, with a focus on the phase retrieval problem as a case study for complex loss landscapes. We first study the high-dimensional limit where both the number $M$ and the dimension $N$ of the data are going to infinity at fixed signal-to-noise ratio $\alpha = M/N$. By analyzing how the local curvature changes during optimization, we uncover that for intermediate $\alpha$, the Hessian displays a downward direction pointing towards good minima in the first regime of the descent, before the dynamics become trapped in bad minima at the end. Hence, the local landscape is benign and informative at first, before gradient descent brings the system into an uninformative maze. The transition between the two regimes is associated with a BBP-type threshold in the time-dependent Hessian. Through both theoretical analysis and numerical experiments, we show that in practical cases, i.e. for finite but even very large $N$, successful optimization via gradient descent in phase retrieval is achieved by falling towards the good minima before reaching the bad ones. This mechanism explains why successful recovery is obtained well before the algorithmic transition corresponding to the high-dimensional limit. Technically, this is associated with strong logarithmic corrections of the algorithmic transition at large $N$ with respect to the one expected in the $N\to\infty$ limit. Our analysis sheds light on this new mechanism, which facilitates gradient descent dynamics in finite large dimensions, and also highlights the importance of good initialization of spectral properties for optimization in complex high-dimensional landscapes.


Diffusion Curvature for Estimating Local Curvature in High Dimensional Data

Bhaskar, Dhananjay, MacDonald, Kincaid, Fasina, Oluwadamilola, Thomas, Dawson, Rieck, Bastian, Adelstein, Ian, Krishnaswamy, Smita

arXiv.org Artificial Intelligence

We introduce a new intrinsic measure of local curvature on point-cloud data called diffusion curvature. Our measure uses the framework of diffusion maps, including the data diffusion operator, to structure point cloud data and define local curvature based on the laziness of a random walk starting at a point or region of the data. We show that this laziness directly relates to volume comparison results from Riemannian geometry. We then extend this scalar curvature notion to an entire quadratic form using neural network estimations based on the diffusion map of point-cloud data. We show applications of both estimations on toy data, single-cell data, and on estimating local Hessian matrices of neural network loss landscapes.


Demystify Optimization Challenges in Multilingual Transformers

Li, Xian, Gong, Hongyu

arXiv.org Artificial Intelligence

Multilingual Transformers improve parameter efficiency and crosslingual transfer. How to effectively train multilingual models has not been well studied. Using multilingual machine translation as a testbed, we study optimization challenges from loss landscape and parameter plasticity perspectives. We found that imbalanced training data causes task interference between high- and low-resource languages, characterized by nearly orthogonal gradients for major parameters and an optimization trajectory mostly dominated by high-resource languages. We show that the local curvature of the loss surface affects the degree of interference, and that existing heuristics of data subsampling implicitly reduce the sharpness, although they still face a trade-off between high- and low-resource languages. We propose a principled multi-objective optimization algorithm, Curvature Aware Task Scaling (CATS), which improves both optimization and generalization, especially for low-resource languages. Experiments on TED, WMT and OPUS-100 benchmarks demonstrate that CATS advances the Pareto front of accuracy while being efficient to apply to massive multilingual settings at the scale of 100 languages.


Estimating Fingertip Forces, Torques, and Local Curvatures from Fingernail Images

Chen, Nutan, Westling, Göran, Edin, Benoni B., van der Smagt, Patrick

arXiv.org Machine Learning

The study of dexterous manipulation has provided important insights into human sensorimotor control as well as inspiration for manipulation strategies in robotic hands. Previous work focused on restricted experimental environments. Here we describe a method that uses the deformation and color distribution of the fingernail and its surrounding skin to estimate the fingertip forces, torques and contact surface curvatures for various objects, including the shape and material of the contact surfaces and the weight of the objects. The proposed method circumvents limitations associated with sensorized objects, gloves or fixed contact surface types. In addition, compared with previous single-finger estimation in an experimental environment, we extend the approach to multiple-finger force estimation, which can be used for applications such as human grasping analysis. Four algorithms, namely Gaussian process (GP), Convolutional Neural Networks (CNN), Neural Networks with Fast Dropout (NN-FD) and Recurrent Neural Networks with Fast Dropout (RNN-FD), are used to model a mapping from images to the corresponding labels. The results further show that the proposed method predicts force, torque and contact surface with high accuracy.